Abstract. In this paper we present a randomized algorithm for computing the collection of
maximal layers for a point set in $E^k$ ($k = f(n)$). The input to our algorithm is a point set
$P = \{p_1, \ldots, p_n\}$ with $p_i \in E^k$. The proposed algorithm achieves a runtime of
$O\left(k n^{2 - \frac{1}{\log k} + \log_k\left(1 + \frac{2}{k+1}\right)} \log n\right)$ when $P$ is
a random order and a runtime of $O\left(k^2 n^{3/2 + (\log_k (k-1))/2} \log n\right)$ for an
arbitrary $P$. Both bounds hold in expectation. Additionally, the runtime is bounded by $O(kn^2)$
in the worst case. This is the first non-trivial algorithm whose runtime remains polynomial
whenever $f(n)$ is bounded by some polynomial in $n$ while remaining sub-quadratic in $n$ for
constant $k$. The algorithm is implemented using a new data structure for storing and answering
dominance queries over the set of incomparable points.
1 Introduction

The problem of finding the maximal layers of a set $P = \{p_1, \ldots, p_n\}$ of $n$ points$^1$ in
$[0,1]^k$ (where $k = f(n)$) is analogous to the problem of finding the convex layers of $P$.
Given $P$, its first maximal layer is defined to be the set $M_1$ of points $q \in P$ such that
for any other $p \in P$, $p \nsucc q$. Here, $\succ$ is any ordering relation between two points.
For example, we could define $\succ$ as follows: $p \succ q$ if $p[j] > q[j]$ (where $p[j]$ is
the $j$-th coordinate of $p$) for all $j$. The first maximal set $M_1$, which we simply refer to
as the maximal set of $P$, has been well studied [HE]. The maximal layer $M_i$ is recursively
defined as the first maximal layer of the remainder of $P$ upon removing from $P$ all the
elements of layers 1 to $i - 1$. Note that $M_i$ could be empty. The maximal layers problem is to
identify all the non-empty maximal layers of $P$ and report them. We shall denote this problem as
MaxLayers(P).


Related Work. We only have a tight bound for MaxLayers(P) when $k \le 3$, which is
$\Theta(n \log n)$ |6ll2j. However, we do not have any improved lower bound when $k > 3$. For
fixed $k > 3$ the best known upper bound is $O(n(\log n)^{k-1})$ [3]. Interestingly, the upper
bound to find only the first maximal set is $O(n(\log n)^{\ldots})$. Both of these bounds hold in
the worst case. We see that, for fixed $k$, these algorithms can be regarded as almost optimal,
as they only have a poly-logarithmic overhead over the theoretical lower bound. Conceptually,
they implement multi-dimensional divide-and-conquer algorithms [7] on input $P$, which introduces
the poly-log factor in their runtimes. The point set $P$ is partitioned into subsets based on the
ordering of points in some arbitrary dimension. Then the maximal sets are computed recursively
and merged later.

Things get interesting if the number of dimensions is not bounded by a constant. When
$k = \Omega(\log n)$, the poly-logarithmic upper bounds above become quasi-polynomial (in $n$).
However, there is a trivial algorithm (which compares each point against the others, and keeps
track of the computed transitive relations) that requires $O(kn^2)$ comparisons in the worst
case. For finding only the first maximal layer, however, m proposed a deterministic algorithm
that runs in $O(n^{(3+\omega)/2})$ when $k = n$, where $O(n^{\omega})$ is the complexity of
multiplying two $n \times n$ matrices. So we see that the algorithm runs in $\omega(n^2)$ time. In
a recent paper |TT|, the authors show that determining whether there exists a pair $(u, v)$,
where $u \in A$ and $v \in B$ ($A$ and $B$ are both sets of vectors of size $O(n)$), such that
$u \succ v$ can be done in sub-quadratic time provided $k = O(\log n)$.

$^1$ We have restricted the sampling set to $[0,1]^k$ in order to simplify our analysis; the
results hold for any arbitrary compact subset of $E^k$.

Our Results. In this paper we propose a randomized algorithm for the MaxLayers(P) problem. When
the point set $P$ is also a random order, the runtime of our algorithm is bounded by
$O\left(k n^{2 - \frac{1}{\log k} + \log_k\left(1 + \frac{2}{k+1}\right)} \log n\right)$ in
expectation. Otherwise, it is $O\left(k^2 n^{3/2 + (\log_k (k-1))/2} \log n\right)$, also in
expectation. Additionally, it takes $O(kn^2)$ time in the worst case. This is the first
non-trivial algorithm for which the following two conditions hold simultaneously: 1) The
worst-case runtime is polynomially bounded (in $n$) as long as $k$ is bounded by some polynomial
in $n$. 2) Whenever $k$ is a constant, the runtime of the proposed algorithm is sub-quadratic in
$n$ (in expectation).

2 Preliminaries 

We denote $P = \{p_1, \ldots, p_n\}$ as the input set of $n$ points in $E^k$. The $j$-th
coordinate of a point $p$ is denoted as $p[j]$. For any points $p, q \in P$, we define an
ordering relation $\succ$ such that $p \succ q$ if $p[j] \;\mathrm{op}\; q[j]$
$\forall j \in [1 \ldots k]$, where op is a placeholder for $>$ or $<$. Consequently, there are
$2^k$ different ordering relations, and for each such ordering there is a unique set of maximal
layers (of $P$). Without loss of generality, we assume in this paper that op is $>$ for all $j$.
Henceforth we will simply use $>$ in place of op. We will use the notation $S \succ p$, where $S$
is a set of incomparable elements, to denote that $\exists q \in S$ such that $q \succ p$. If
$S \succ p$ we say that $S$ is "above" $p$. Furthermore, if $p \succeq q$ then either $p = q$ or
$p \succ q$.

Clearly, $(P, \succ)$ defines a partial order. We shall simply use $P$ to denote this poset when
the context is clear. If $p \succ q$ then we say that $p$ precedes (or dominates) $q$ in the
partial order and that they are comparable. We say that $p$ and $q$ are incomparable (denoted by
$p \parallel q$) if $p \nsucc q$ and $q \nsucc p$. If $p$ and $q$ belong to the same maximal
layer then $p \parallel q$. Let the height $h$ of $P$ be defined as the number of non-empty
maximal layers of $P$. We also define the width $w$ of $P$ as the size of the largest subset of
$P$ of mutually incomparable elements. Note that the maximum size of any layer is $\le w$.

Let $O : P \times P \to \{0,1\}^k$, such that $O(p, q)[j] = 1$ if $p[j] < q[j]$ and 0 otherwise.
This definition, which might seem inverted, will make sense when we discuss it in the context of
our data structures. We call $O$ the orthant function as it computes the orthant with origin $p$
in which $q$ resides. Henceforth, the maximal layers will simply be referred to as layers. Let
$T$ be a linear ordering of the points in $P$ such that for any $p, q \in P$, if $p \succ q$ then
$p$ precedes $q$ in $T$; that is, a linear extension preserves the precedence relations between
the elements of $P$. Let $|S|$ denote the size of the set $S$. We are now ready to state the
MaxLayers(P) problem formally:

Definition 1 (MaxLayers(P)). Given a point set $P$ along with an ordering relation $\succ$ as
defined above, label each point in $P$ with the rank of the maximal layer it belongs to.
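
As a point of reference for Definition 1, the labels can be produced by repeatedly peeling off the maximal set of what remains. The sketch below is a brute-force baseline, not the algorithm proposed in this paper; it assumes the strict relation ($p \succ q$ if $p[j] > q[j]$ for all $j$) and distinct input points, and the names `dominates` and `max_layers_naive` are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Return True if p exceeds q in every coordinate (p dominates q)."""
    return all(pj > qj for pj, qj in zip(p, q))


def max_layers_naive(points: List[Point]) -> Dict[Point, int]:
    """Label each point with the 1-based rank of its maximal layer.

    Brute-force peeling (assumes distinct points): layer i consists of the
    remaining points that no other remaining point dominates.
    """
    remaining = list(points)
    rank: Dict[Point, int] = {}
    layer = 1
    while remaining:
        maximal = [q for q in remaining
                   if not any(dominates(p, q) for p in remaining)]
        for q in maximal:
            rank[q] = layer
        remaining = [q for q in remaining if q not in maximal]
        layer += 1
    return rank
```

Each round scans all remaining pairs, so this baseline is only meant for checking small instances against a faster implementation.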

In our analysis we shall use the typical RAM model, where an operation of the form
$p[j] > q[j]$? takes constant time. We shall first assume that the point set $P$ forms a random
order. Then we will extend the result to an arbitrary set of points. Below we define random
orders formally according to the definition in [T].

Definition 2. We pick a set of $n$ points uniformly at random from $[0,1]^k$. Then the partial
order generated by these points is a random order.

This is equivalent to saying that $(P, \succ)$ is the intersection of $k$ linear orders
$T_1 \times \ldots \times T_k$, where the $k$-tuple $(T_1, \ldots, T_k)$ is chosen uniformly at
random from $(n!)^k$ such tuples. Here, each $T_j$ is a linear ordering (permutation) of
$\{1, 2, \ldots, n\}$. Whenever we present our runtime results in terms of $w$ and/or $h$, it is
assumed that both are upper bounded by $n$, the number of points in $P$. To simplify our analysis
we ignore the expected values of $w$ and $h$, which could only have made our results stronger
(for example, see |ll2).
3 The Iterative Algorithm

We shall use MaxPartition(P) as the main procedure for solving an instance of MaxLayers(P).
First we will describe a simpler algorithm and analyze it for a random order $P$. Then we extend
it to an arbitrary set of points.


3.1 Data Structures 

In this section we introduce the framework on which our algorithm is based. Let $B$ be a
self-balancing binary search tree (for example, $B$ could be realized as a red-black tree). Let
$B(i)$ be the $i$-th node in the in-order of $B$. Each node of $B$ stores three pointers: one for
each of its children (null in place of an empty child) and another which points to an auxiliary
data structure. If $X$ is a node in $B$ then the left and right children of $X$ are denoted as
$l(X)$ and $r(X)$ respectively. We also denote by $L(X)$ the auxiliary data structure associated
with $X$. When the context is clear, we shall simply use $L$ in place of $L(X)$.

We also let $L$ be a placeholder for any data structure that can be used to store the set of
points from a single layer of $P$. For example, $L$ could be realized as a linked list.
Additionally, $L$ must support Insert(L, p) and Above(L, p). The Above(L, p) operation takes a
query point $p$ and answers the query $L \succ p$? The Insert(L, p) operation inserts $p$ into
$L$, and assumes that $p$ is incomparable to the elements in $L$. So, we must ensure that $L$ is
the correct layer for $p$ before calling Insert(L, p).

We observe that the layers of $P$ are themselves linearly ordered by their ranks from 1 to $h$.
We can thus use $B$ to store the layers in sorted order, where each node $B(i)$ would store the
corresponding layer $M_i$ (using $L(B(i))$). We endow $B$ with Insert(B, p) and Search(B, p)
operations (we do not need deletion). The Insert(B, p) procedure first calls the Search(B, p)
procedure to identify the node $B(i)$ of $B$ to which the new point $p$ should belong, and then
calls Insert(L(B(i)), p). If $p$ does not belong to any layer currently in $B$ then we create a
new node in $B$. The Search(B, p) procedure works as follows: we can think of $B$ as a normal
binary search tree, where the usual comparison operator $>$ has been replaced by the Above(L, p)
procedure. Furthermore, the procedure can only identify whether $L \succ p$ or $L \nsucc p$. This
is exactly equivalent to the situation where we have replaced the comparison operator $>$ with
$\ge$. So we must determine two successive nodes $B(i)$ and $B(i+1)$ such that
$L(B(i)) \succ p$ and $L(B(i+1)) \nsucc p$. If such a pair of nodes does not exist then we return
a null node.

3.2 MaxPartition(P) 

We begin by first computing a linear extension $T$ of $P$. We initialize $B$ as an empty tree. We
iteratively pick points from $P$ in increasing order of their ranks in $T$ and call
Insert(B, p), where $p$ is the current point to be processed. Insert(B, p) subsequently calls
Search(B, p). We have two possibilities:

CASE 1: Search(B, p) returns a non-empty node $B(i)$. We then call Insert(L(B(i)), p).

CASE 2: Search(B, p) returns a null node. Then we create the node $B(m+1)$ in $B$, where $m$ is
the number of nodes currently in $B$. We first initialize $B(m+1)$ and then call
Insert(L(B(m+1)), p) on it. We note that when we create a new node in $B$ it must always be the
right-most node in the in-order of $B$. This follows from the order in which we process the
points. Since $p$ succeeds a processed point $q$ in the linear extension $T$, we have
$p \nsucc q$. Thus, if $p$ does not belong to any of the nodes currently in $B$ then it must be
the case that $p$ is below all layers in $B$.

MaxPartition(P) terminates after all points have been processed. At termination $L(B(i))$ stores
all of the points in $M_i$ for $1 \le i \le h$. We make a couple of observations here. 1) When a
point is inserted into a node $B(i)$ it will never be displaced from it by any point arriving
after it. 2) Since nodes are always added as the right-most node in $B$, for Search(B, p) to be
efficient, $B$ must support self-rebalancing.

If we assume that Above(L, p) and Insert(L, p) work correctly, we see at once that Search(B, p)
and Insert(B, p) are also correct. Hence, each point is correctly assigned to the layer it
belongs to.
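
The iteration described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the balanced tree $B$ is replaced by a plain Python list of layers (so rebalancing is not modeled), each layer $L$ is a plain list with a linear-scan Above, and `search` returns the index of the first layer that is not above $p$, with `len(layers)` playing the role of the null node in CASE 2. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p exceeds q in every coordinate."""
    return all(pj > qj for pj, qj in zip(p, q))


def above(layer: List[Point], p: Point) -> bool:
    """Above(L, p) for a layer kept as a list: does some q in L dominate p?"""
    return any(dominates(q, p) for q in layer)


def search(layers: List[List[Point]], p: Point) -> int:
    """Binary search for the first layer that is NOT above p.

    Relies on the invariant that if layer i is above p then so is every
    earlier layer. Returns len(layers) when all layers are above p.
    """
    lo, hi = 0, len(layers)
    while lo < hi:
        mid = (lo + hi) // 2
        if above(layers[mid], p):
            lo = mid + 1
        else:
            hi = mid
    return lo


def max_partition(points: List[Point]) -> List[List[Point]]:
    """Return the maximal layers of `points`, first layer first."""
    # Linear extension: decreasing order of the largest coordinate.
    order = sorted(points, key=max, reverse=True)
    layers: List[List[Point]] = []
    for p in order:
        i = search(layers, p)
        if i == len(layers):      # CASE 2: p is below every existing layer
            layers.append([])
        layers[i].append(p)       # CASE 1 (or the freshly created layer)
    return layers
```

With list-based layers each Above call costs $O(k|L|)$; the point of the sketch is only the control flow of Search and Insert on the sequence of layers.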
3.3 Runtime Analysis

Let Above(L, p) take $t_a(|L|)$ time. As mentioned in Section 2, $|L| \le w$ for any layer in
$B$. Hence, $t_a(w)$ is an upper bound on the runtime of Above(L, p). Similarly, we bound the
runtime of Insert(L, p) with $t_i(w)$. Let $p$ be the next point to be processed. At that time
$B$ will have at most $h$ nodes. In order to process $p$, Insert(B, p) will be invoked, which in
turn calls Search(B, p) as discussed above. Search(B, p) employs a normal binary search on $B$
with the exception that at each node of $B$ it invokes Above(L, p) instead of doing a standard
comparison. Since $B$ is self-balancing, the height of $B$ is bounded by $O(\log h)$. Hence, the
number of calls to Above(L, p) is also bounded by $O(\log h)$, each of which takes $t_a(w)$ time.
Also, for each point $p$, Insert(L, p) is called only once. We also assume that initializing a
node in $B$ takes constant time. So, processing $p$ takes $O(t_a(w) \log h + t_i(w))$ time, and
this holds for any point.

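Lemma 1. We can compute a linear extension $T$ of $P$ in $O(n \log n + kn)$ time in the worst
case.

Proof. We shall compute $T$ as follows: Let $\mu(p) = \max_{1 \le j \le k} p[j]$. Then sorting
the points in decreasing order of $\mu(p)$ will give us $T$. Indeed, if $p \succ q$ then every
coordinate of $p$ exceeds the corresponding coordinate of $q$, so $\mu(p) > \mu(q)$ and $p$ is
placed before $q$; hence $T$ is a linear extension of $P$. This takes $O(n \log n + kn)$ time in
the worst case.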

□ 
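
The construction in the proof of Lemma 1 amounts to a single sort on the largest coordinate. The following sketch shows it together with a quadratic-time checker that is only useful for testing; the helper names are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def linear_extension(points: List[Point]) -> List[Point]:
    """Sort by decreasing mu(p) = max_j p[j].

    Computing the keys costs O(kn) and the sort costs O(n log n).
    """
    return sorted(points, key=max, reverse=True)


def is_linear_extension(order: List[Point]) -> bool:
    """Check that no point is dominated by a point appearing later in `order`."""
    for i, p in enumerate(order):
        for q in order[i + 1:]:
            if all(qj > pj for qj, pj in zip(q, p)):
                return False
    return True
```

Ties in $\mu$ can be broken arbitrarily, since two points with equal largest coordinate cannot strictly dominate one another.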

The reason for computing $T$ in this way will become clear when we get to the analysis of our
algorithm. Later we shall see that the time bounds for Above(L, p) and Insert(L, p) dominate the
time it takes to compute $T$. So we shall ignore this term in our runtime analysis. The next
theorem follows directly from the discussion above.

Theorem 1. The procedure MaxPartition(P) takes $O(n(t_a(w) \log h + t_i(w)))$ time and upon
termination outputs a data structure consisting of the maximal layers of $P$ in sorted order.

4 Realization of L using Half-Space Trees 

In this section we introduce a new data structure for implementing L. We shall refer to it as 
Half-Space Tree (HST). 

The function $O(p, q)$ computes which orthant $q$ belongs to with respect to $p$ as the origin.
Clearly, there are $2^k$ such orthants, each having a unique label in $\{0,1\}^k$. Let $H_j(p)$
be a half space defined as
$H_j(p) = \{q \in [0,1]^k \mid O(p, q) \in \{0,1\}^{j-1} 0 \{0,1\}^{k-j}\}$, passing through
origin $p$, whose normal is parallel to dimension $j$. Here, $\{0,1\}^{j-1} 0 \{0,1\}^{k-j}$
represents a 0-1 vector for which the $j$-th component is 0. We shall use the notation $h_j(p)$
to denote the extremum orthant of $H_j(p)$ (w.r.t. $\succ$), that is,
$h_j(p) = 1^{j-1} 0\, 1^{k-j}$. There are $k$ such half spaces. An orthant whose label contains
$m$ 1's lies in the intersection of some $k - m$ such half spaces.

Lemma 2. If $p, q \in P$ and $p \parallel q$ then $O(p, q) \in \{0,1\}^k \setminus \{0^k, 1^k\}$.
That is, $q$ can only belong to orthants which lie in the intersection of at most $k - 1$ half
spaces.

Proof. Trivially follows from definitions. □
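
To make the orthant labels concrete, here is a small sketch of the orthant function and of the statement of Lemma 2. It assumes that the two points differ in every coordinate (which holds with probability 1 for points drawn uniformly from $[0,1]^k$); with ties, a label of $0^k$ can occur without strict dominance. The function names are illustrative.

```python
from typing import Sequence, Tuple


def orthant(p: Sequence[float], q: Sequence[float]) -> Tuple[int, ...]:
    """O(p, q)[j] = 1 if p[j] < q[j], and 0 otherwise."""
    return tuple(1 if pj < qj else 0 for pj, qj in zip(p, q))


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p exceeds q in every coordinate."""
    return all(pj > qj for pj, qj in zip(p, q))


def incomparable(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if neither point dominates the other."""
    return not dominates(p, q) and not dominates(q, p)


def half_spaces_containing(p: Sequence[float], q: Sequence[float]):
    """Indices j (0-based) with q in H_j(p), i.e. O(p, q)[j] = 0."""
    return [j for j, bit in enumerate(orthant(p, q)) if bit == 0]
```

For incomparable points the returned index list is non-empty and has at most $k - 1$ entries, which is exactly the set from which Insert later samples.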

Corollary 1. The above lemma holds if $p$ and $q$ belong to the same layer. However, the converse
of this statement is not true.

4.1 Half-Space Tree 

We define a $k$-dimensional HST recursively as follows:

Definition 3 (HST). 1. A singleton node (root) storing a point $p$.

2. A root has a number of non-empty children nodes (up to $k$), each of which is an HST.

3. If node $q$ is the $j$-th child of node $p$ then $h_j(p) \succeq O(p, q)$.

An HST stores points from a single layer. So Corollary 1 tells us that for any node $p$ and a
new point $q$, at most $k - 1$ of the children nodes satisfy $h_j(p) \succeq O(p, q)$. Hence, $q$
can be inserted into any one of these children nodes. Henceforth, we will also use $w$ (the width
of $(P, \succ)$) to bound the number of points currently stored inside $L$.
Above(L, p) Let us assume that $L$ is realized by an HST. Above(L, p) works as follows: First we
compute $O(r, p)$, where $r$ is the root node. If $O(r, p) = 0^k$ then we return $L \succ p$.
Otherwise we call Above(j(L), p) recursively on each non-empty child node $j$ of root $r$ such
that $h_j(r) \succeq O(r, p)$. When all calls reach some leaf node, we stop and return
$L \nsucc p$.

Proof of correctness. CASE 1 ($L \succ p$): Let $q$ be some point in $L$ such that $q \succ p$,
prior to calling Above(L, p). Before reaching the node $q$, if we find some other node
$q' \succ p$ then we are done. So we assume this is not the case. We claim that $p$ will be
compared with $q$. We show this as follows: Let the length of the path from root $r$ to $q$ be
$i + 1$. Let $u_0, \ldots, u_i$ be the sequence of nodes in this path (here $u_0 = r$ and
$u_i = q$). Since $q \succ p$, $O(u_m, q) \succeq O(u_m, p)$ for all $0 \le m < i$. But $u_m$ is
a predecessor node in the path from $r$ to $q$, hence $h_{j_m}(u_m) \succeq O(u_m, q)$, where
$u_{m+1}$ is the $j_m$-th child of $u_m$. This implies $h_{j_m}(u_m) \succeq O(u_m, p)$ (from
transitivity of $\succeq$) for $0 \le m < i$. Thus we will traverse this path at some point
during our search.

CASE 2 ($L \nsucc p$): Follows trivially from the description of Above(L, p). □

Insert(L, p) Insert(L, p) is called with the assumption that $L \nsucc p$. If the root is empty
then we make $p$ the root and stop. Otherwise, we pick one element $j$ uniformly at random from
the set $S_r = \{j \in \{1, \ldots, k\} \mid h_j(r) \succeq O(r, p)\}$ and recursively call
Insert(j(L), p).

Proof of correctness. It is easy to verify that the insert procedure maintains the properties of
an HST given in Definition 3. □
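
The two procedures can be sketched as a small class. This is an illustrative reading of the description above rather than the authors' code: children are indexed from 0, the set $S_r$ is computed as the coordinates $j$ with $O(r, p)[j] = 0$, and `insert` assumes its precondition (the new point is incomparable to every stored point, with no coordinate ties), otherwise the candidate set may be empty. The class and method names are hypothetical.

```python
import random
from typing import List, Optional, Sequence


class HST:
    """Half-space tree holding k-dimensional points of a single layer."""

    def __init__(self, k: int, rng: Optional[random.Random] = None):
        self.k = k
        self.rng = rng or random.Random()
        self.point: Optional[Sequence[float]] = None   # None marks an empty node
        self.children: List[Optional["HST"]] = [None] * k

    def _zero_coords(self, p: Sequence[float]) -> List[int]:
        """S_r: coordinates j where O(r, p)[j] = 0, i.e. not r[j] < p[j]."""
        return [j for j in range(self.k) if not self.point[j] < p[j]]

    def above(self, p: Sequence[float]) -> bool:
        """Above(L, p): is there a stored point r with O(r, p) = 0^k?"""
        if self.point is None:
            return False
        zeros = self._zero_coords(p)
        if len(zeros) == self.k:
            return True
        # Only subtrees j with h_j(r) >= O(r, p) can contain such a point.
        return any(self.children[j] is not None and self.children[j].above(p)
                   for j in zeros)

    def insert(self, p: Sequence[float]) -> None:
        """Insert(L, p): walk down, choosing uniformly among admissible children."""
        node = self
        while node.point is not None:
            j = node.rng.choice(node._zero_coords(p))
            if node.children[j] is None:
                node.children[j] = HST(node.k, node.rng)
            node = node.children[j]
        node.point = p
```

The answer returned by `above` does not depend on the random choices made by `insert`; those choices only affect the shape of the tree and hence the number of nodes visited.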

Although the insert procedure is itself quite simple, it is important that we understand the
random choices it makes before moving further. These observations will be crucial to our
analysis later. Let the current height of $L$ be $h_L$. By $L^*$ we denote the complete HST of
height $h_L$; clearly $L^*$ has $\Theta(k^{h_L})$ nodes. We color an edge of $L^*$ red if both
of the nodes it is incident to are present in $L$; otherwise we color it blue. Unlike Above, we
can imagine that the Insert procedure works with $L^*$ instead of $L$. Upon reaching a node $r$
in $L^*$ the procedure samples uniformly at random from the set $S_r$ as above. This set may
contain edges of either color. If a blue edge has been sampled then we stop and insert $p$ into
the empty node incident to the blue edge in $L$. So we see that, despite not being in $L$, the
nodes incident to blue edges affect the sampling probability equally.


4.2 Runtime Analysis 

Here we compute $t_a(w)$ and $t_i(w)$ in expectation over the random order $P$ and the internal
randomness of the Insert(L, p) procedure. From the discussion in Section 4.1 we clearly see that
$t_i(w) = O(t_a(w))$. So it suffices to upper bound $t_a(w)$ in expectation. Furthermore, we only
need to consider the case when Above(L, p) returns $L \nsucc p$, as the other case would take
fewer comparisons. Let this time be $u(w)$. We divide our derivation of $u(w)$ into two main
steps:

i. Compute the expected number of nodes at depth $d$ of $L$ having $w$ nodes.

ii. Use that to put an upper bound on the number of nodes visited during a call to Above(L, p)
(when $L \nsucc p$).


We choose to process points according to $T$ as detailed earlier. We denote this ordering by the
ordered sequence $(p_1, \ldots, p_n)$.

Lemma 3. For any two points $p, q$ where $p$ precedes $q$ in $T$, the probability that
$p[j] > q[j]$ is $\eta_1(k) = 1 - \frac{k-1}{2(k+1)}$. Additionally, if $p$ and $q$ are
incomparable then it is $\eta_2(k) = 1 - \ldots$

Proof. See appendix. □ 

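Theorem 2. After $w$ insertions the expected number of nodes at depth $d$ in $L$ is given by:

$$E[X_{w,d}] = k^d\left(1 - \sum_{i=1}^{d} \frac{(1 - k^{-i})^{w-1}}{\prod_{j=1, j \ne i}^{d}(1 - k^{j-i})}\right)$$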
Proof. Let $X_{w,d}$ be the number of nodes at depth $d$ of $L$ after $w$ insertions. Due to the
second assertion of Lemma 3 we know that any new point to be inserted can belong to any of the
$k$ half-spaces with probability $\eta_2(k)$, which is constant over the half-spaces. The insert
procedure selects one of these candidate half-spaces uniformly at random. Thus it follows from
symmetry that a particular half-space will be chosen for insertion with probability $1/k$. If the
subtree is non-empty then we do this recursively. We define an indicator random variable
$I_{t,d}$ for the event that the $t$-th insertion adds a node at depth $d$. Then,

$$X_{w,d} = \sum_{t=1}^{w} I_{t,d}$$

Taking expectation on both sides we get,

$$E[X_{w,d}] = \sum_{t=1}^{w} \Pr[I_{t,d}]$$

Trivially, $E[X_{t,0}] = 1$ for $t > 0$. When $d = 1$ and $t \ge 2$ then
$\Pr[I_{t,1}] = 1 - \frac{X_{t-1,1}}{k}$. This is because there are $X_{t-1,1}$ nodes at depth 1
(nodes directly connected to the root), hence there are $k - X_{t-1,1}$ empty slots for the node
to get inserted at depth 1; otherwise it will be recursively inserted at some deeper node. Hence
we have,

$$E[X_{w,1}] = \sum_{t=2}^{w} \left(1 - \frac{E[X_{t-1,1}]}{k}\right)$$

For $d = 2$, we can similarly argue that the probability of insertion at depth 2 for some
$t \ge 3$ is equal to the probability of reaching a node at depth 1 times the probability of
being inserted at depth 2. It is not difficult to see that this equals
$\frac{X_{t-1,1}}{k}\left(1 - \frac{X_{t-1,2}}{k X_{t-1,1}}\right)$, so that

$$E[X_{w,2}] = \sum_{t=3}^{w} E\left[\frac{X_{t-1,1}}{k}\left(1 - \frac{X_{t-1,2}}{k X_{t-1,1}}\right)\right]$$

Proceeding in this way we see that,

$$E[X_{w,d}] = \sum_{t=1}^{w} \left(\frac{E[X_{t-1,d-1}]}{k^{d-1}} - \frac{E[X_{t-1,d}]}{k^{d}}\right)$$

Here we again take expectation on both sides and simplify the expression so that the sum starts
from $t = 1$, since the terms $E[X_{t,d}] = 0$ when $t < d$.

Let $a(w,d) = E[X_{w,d}]$; we can then simplify the above equation to get the following
recurrence,

$$a(w,d) = \frac{a(w-1,d-1)}{k^{d-1}} + \left(1 - \frac{1}{k^{d}}\right) a(w-1,d)$$

with $a(w,d) = 0$ for $w < d$. The solution to this can be found by choosing an ordinary
generating function $G_d(z)$ with parameter $d$, such that $G_d(z) = \sum_{w \ge 0} a(w,d) z^w$.
The solution [see appendix] completes the proof of the theorem.

□ 
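
The recurrence can be tabulated exactly for small parameters, which is a convenient sanity check on the derivation. The sketch below assumes the recurrence reads $a(w,d) = a(w-1,d-1)/k^{d-1} + (1 - 1/k^d)\,a(w-1,d)$ with boundary values $a(w,0) = 1$ for $w \ge 1$ and $a(1,d) = 0$ for $d \ge 1$; the function name is illustrative.

```python
from fractions import Fraction
from typing import Dict, Tuple


def expected_depth_counts(w_max: int, k: int) -> Dict[Tuple[int, int], Fraction]:
    """Tabulate a(w, d) for 1 <= w <= w_max and 0 <= d < w_max.

    a(w, d) is the expected number of nodes at depth d after w insertions
    under the recurrence stated in the lead-in (exact rational arithmetic).
    """
    a: Dict[Tuple[int, int], Fraction] = {}
    for d in range(w_max):
        a[(1, d)] = Fraction(1 if d == 0 else 0)   # one point: only the root
    for w in range(2, w_max + 1):
        a[(w, 0)] = Fraction(1)                     # the root always exists
        for d in range(1, w_max):
            reach = a[(w - 1, d - 1)] / Fraction(k) ** (d - 1)
            stay = (1 - Fraction(1, k ** d)) * a[(w - 1, d)]
            a[(w, d)] = reach + stay
    return a
```

Summing $a(w,d)$ over all depths gives $w$, as it must, because every insertion adds exactly one node to the tree.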


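Before moving on to the main theorem we need another lemma:

Lemma 4. If $B = \{b_0, b_1, \ldots, b_n\}$ is a sequence such that
$b_r \ge b_{r+1} \ge \ldots \ge b_n$ then the sum
$S = \sum_{i=0}^{n} b_i m^i \le \sum_{i=0}^{r} b_i m^i + \frac{b_{r+1} m^{r+1}}{1 - m}$, where
$m < 1$.

Proof. See appendix.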

Corollary 2. If m = \ — \ and k > 4 then , S < Xi=o + l&r-i-iw’'. 


□ 
Theorem 3. The expected number of nodes visited during an unsuccessful search, $u(w)$, is bounded
by $O\left(w^{1 - \frac{1}{\log k} + \log_k\left(1 + \frac{2}{k+1}\right)}\right)$.

Proof. Before proving this we make the following observation. If for any $d = d_0$ the sequence
$a(w,d)$ becomes decreasing, that is, $a(w,d_0) \ge a(w,d_0 - 1)$ and
$a(w,d_0) \ge a(w,d_0 + 1)$, then afterwards it will stay decreasing. This is clear from the fact
that $a(w,d)$ represents the expected number of nodes at depth $d$ after $w$ insertions. So the
sequence $a(w,d)$ is unimodal, since $a(w,0) \le a(w,1)$ trivially for $w \ge 2$. Let $d_0$ be
the value that maximizes $a(w,d)$.

Let us compute the probability of visiting a node at depth $d$ during a call to Above when the
query point is not below $L$. Let $q$ be the current node being checked and $p$ be the query
point. According to Lemma 3 the probability $\Pr[p \in H_j(q)]$ is the same for any $j$ and does
not depend on the rank of $q$ in $T$. Hence it also does not depend on the depth of $q$ in $L$.
Furthermore, this probability is $\eta_1 = 1 - \frac{k-1}{2(k+1)}$, again from Lemma 3.

Thus the probability of visiting a node at depth $d$ is the result of $d$ independent moves each
having probability $\eta_1$, hence it is $\eta_1^d$. Now we can find the expression for the
expected number of nodes visited:

$$u(w) = \sum_{d=0}^{w-1} \eta_1^{d}\, a(w,d) \le \sum_{d=0}^{d_0} \eta_1^{d}\, a(w,d) + \frac{\eta_1^{d_0+1}}{1 - \eta_1}\, a(w, d_0 + 1) \qquad (1)$$

Here we use Theorem 2, Lemma 4 and its corollary, and the fact that the sequence $a(w,d)$ is
unimodal, to bound $u(w)$. Also note that $a(w,d) \le k^d$. Now we need to upper bound $d_0$.
With some tedious algebra [see appendix] we get $d_0 \le \log_k w + 2$. Again, after some more
algebra [see appendix] we finally get,

$$u(w) = O\left(w^{1 - \frac{1}{\log k} + \log_k\left(1 + \frac{2}{k+1}\right)}\right) \qquad (2)$$

This proves Theorem 3. □

Corollary 3. The algorithm runs in
$O\left(k n^{2 - \frac{1}{\log k} + \log_k\left(1 + \frac{2}{k+1}\right)} \log n\right)$ time in
expectation.

Proof. From Theorem 1 and the first paragraph of Section 4.2 we see that the runtime of the
algorithm is $O(k n\, u(w) \log h)$, since computing $O(p, q)$ between a pair of vectors takes
$O(k)$ time. Using the upper bound on $u(w)$ and the fact that $w, h \le n$ we get the runtime as
claimed above.

□ 
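
To get a feel for how far below quadratic the random-order bound is, one can evaluate the exponent of $n$ numerically. The sketch below assumes the exponent reads $2 - \frac{1}{\log k} + \log_k\left(1 + \frac{2}{k+1}\right)$ with $\log$ taken base 2, which is the same as $2 + \log_k \eta_1$ for $\eta_1 = \frac{k+3}{2(k+1)}$; the function names are illustrative, and the leading factor $k$ and the $\log n$ factor are ignored.

```python
import math


def eta1(k: int) -> float:
    """eta_1(k) = 1 - (k - 1) / (2 (k + 1)) = (k + 3) / (2 (k + 1))."""
    return 1.0 - (k - 1) / (2.0 * (k + 1))


def random_order_exponent(k: int) -> float:
    """Exponent of n in the expected random-order bound (k and log n factors excluded)."""
    return 2.0 - 1.0 / math.log2(k) + math.log(1.0 + 2.0 / (k + 1), k)


if __name__ == "__main__":
    for k in (4, 8, 64, 1024):
        print(k, round(random_order_exponent(k), 4))
```

The exponent stays below 2 for every fixed $k \ge 2$ and approaches 2 as $k$ grows, in line with the claim that the bound is sub-quadratic for constant $k$.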


5 Extension to Arbitrary P 

The previous algorithm would still be correct if $P$ is not a random order. However, the expected
runtime will no longer hold. In order to make our previous analysis work for any set of points we
modify the way we store the layers. In this new setting layers are still arranged using a
balanced binary tree $B$, exactly as before. However, each layer is now stored using a list of
HSTs instead of just a single one. Let us call this data structure List-HST. We extend the Above
and Insert procedures for HST in an obvious way.

List-HST starts with an empty list. Attached to a List-HST is another list $R$ in which newly
arrived points are kept temporarily before they are ready to be inserted in the List-HST.
Initially this list $R$ is also empty. We take the maximum size of $R$ as $\sqrt{w}$. As long as
$R$ has fewer than $\sqrt{w}$ points we keep adding to it. Once $R$ has been filled, we create an
HST from the points in $R$ and remove these points from $R$. This becomes the first HST in the
list. We repeat these steps again when $R$ is full. We describe the modified List-HST-Above and
List-HST-Insert below.
List-HST-Insert(p) If $R$ is not yet full then we just add $p$ to $R$. Otherwise we create an HST
from the points in $P' = R \cup \{p\}$. We randomly permute the elements in $P'$ and pick the
first element in this ordering as the root. We then build the HST iteratively by picking elements
in this order. Next we prove a lemma similar to Lemma 3.

Lemma 5. Assume that an HST is built by inserting points in a random order. Let $X$ represent a
point which is already inserted and $Y$ a new point being compared to $X$. Then $Y$ belongs to
any of the children subtrees of $X$ with equal probability.

Proof. Since the insertion order is random, $X$ and $Y$ are both random variables. We compute the
probability $\Pr[X[i] > Y[i]]$ for some $i$. Now for some arbitrary pair of points $\{p, q\}$ the
probability that $p$ precedes $q$ in the ordering is 1/2. If $X, Y \in \{p, q\}$ then
$\Pr[X[i] > Y[i] \mid X, Y \in \{p, q\}] = (\Pr[X[i] > Y[i] \mid X = p, Y = q] + \Pr[X[i] > Y[i] \mid X = q, Y = p])/2$.
$\Pr[p[i] > q[i]]$ is either 0 or 1 since the points $p$ and $q$ themselves are not random.
Hence, $\Pr[X[i] > Y[i] \mid X, Y \in \{p, q\}] = 1/2$ for any pair $\{p, q\}$, which proves the
claim above. □

Using the above lemma and the techniques used to prove Theorem 3 we can show that
$d_0 \le \log_k \sqrt{w} + 2$. Hence, building an HST takes $O(\sqrt{w} \log_k w)$ time in
expectation and $O(w)$ in the worst case. Since there can only be $\sqrt{n}$ such steps where we
build an HST and each takes $O(n)$ time (since $w \le n$), the insert operation on List-HST adds
$O(n^{3/2})$ to the overall running time of our algorithm. This is insignificant compared to the
total time.

List-HST-Above(p) For each HST $L$ in the list we call Above(L, p). If none of these calls finds
a point above $p$ then we check the remaining points in $R$. However, since $p$ is not random we
cannot compute the probability $\eta_1$ as we did before. Instead, we can upper bound the
fraction of subtrees that are visited from a node. We see that a point $p$ can visit at most
$k - 1$ subtrees of a node $q$; otherwise we can conclude that $q \succ p$. The List-HST-Insert
procedure creates the $j$-th subtree of $q$ with equal probability for all $j$. Hence, during the
search step the point $p$ will visit a non-empty subtree of $q$ with probability
$\le (k - 1)/k$. This value can be substituted as an upper bound for $\eta_1$ in Equation 1,
which leads to $u(w) \le O(k \sqrt{w}^{\log_k (k-1)})$. Since there are at most $\sqrt{w}$ HSTs
in a layer, it takes $O(k^2 w^{1/2 + (\log_k (k-1))/2})$ time in expectation to search a list of
HSTs. We ignore the time it takes to check the set $R$, which is $O(k\sqrt{w})$. Hence the total
runtime is bounded by $O(k^2 n^{3/2 + (\log_k (k-1))/2} \log n)$ in expectation.
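
A layer stored as a List-HST can be sketched as follows. This is a simplified illustration under assumptions: the buffer capacity is passed in as a parameter (standing in for $\sqrt{w}$, which is not known up front), the HST follows the Section 4 description with 0-based child indices, and points within a layer are assumed pairwise incomparable with no coordinate ties. All class and method names are hypothetical.

```python
import random


def covers(r, p):
    """True when O(r, p) = 0^k, i.e. r[j] >= p[j] in every coordinate."""
    return all(not rj < pj for rj, pj in zip(r, p))


class HST:
    """Minimal half-space tree (same conventions as Section 4)."""

    def __init__(self, k, rng):
        self.k, self.rng = k, rng
        self.point = None
        self.children = [None] * k

    def _zeros(self, p):
        return [j for j in range(self.k) if not self.point[j] < p[j]]

    def insert(self, p):
        node = self
        while node.point is not None:
            j = node.rng.choice(node._zeros(p))   # needs p incomparable to node.point
            if node.children[j] is None:
                node.children[j] = HST(node.k, node.rng)
            node = node.children[j]
        node.point = p

    def above(self, p):
        if self.point is None:
            return False
        zeros = self._zeros(p)
        if len(zeros) == self.k:
            return True
        return any(self.children[j] is not None and self.children[j].above(p)
                   for j in zeros)


class ListHST:
    """One layer: a list of HSTs plus the buffer R of pending points."""

    def __init__(self, k, capacity, seed=None):
        self.k, self.capacity = k, capacity
        self.rng = random.Random(seed)
        self.trees = []
        self.buffer = []                          # the list R

    def insert(self, p):
        if len(self.buffer) < self.capacity:      # R not yet full
            self.buffer.append(p)
            return
        batch = self.buffer + [p]                 # P' = R with p added
        self.buffer = []
        self.rng.shuffle(batch)                   # random insertion order
        tree = HST(self.k, self.rng)
        for q in batch:
            tree.insert(q)
        self.trees.append(tree)

    def above(self, p):
        if any(tree.above(p) for tree in self.trees):
            return True
        return any(covers(q, p) for q in self.buffer)
```

Shuffling each batch before building its tree is what makes the insertion order random regardless of the order in which the points arrived.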


5.1 A Summary of Results 

We summarize the main results as follows: 

i. $k$ is a constant. From Corollary 3 we can easily verify that the algorithm has a runtime of
$O(n^{2 - \delta(k)})$ where $\delta(k) > 0$. This remains true even when $P$ is not a random
order.

ii. $k$ is some function of $n$. We let $k = f(n)$. For any $k$ the runtime of our algorithm is
bounded by $O(kn^2)$ in the worst case. This bound does not hold for the divide-and-conquer
algorithm in [3]. Also, the proposed algorithm never admits a quasi-polynomial runtime, unlike
any of the previously proposed non-trivial algorithms.

Concluding Remarks 

In this paper we proposed a randomized algorithm for the MaxLayers(P) problem. Unlike previous
authors we also consider the case when $k$ is not a constant; this is often the case for
real-world data sets whose tuple dimensions are not insignificant with respect to the set size.
In this setting we show that the expected runtime of our algorithm is
$O\left(k n^{2 - \frac{1}{\log k} + \log_k\left(1 + \frac{2}{k+1}\right)} \log n\right)$ when $P$
is a random order. For any arbitrary set of points in $E^k$ it exhibits a runtime of
$O\left(k^2 n^{3/2 + (\log_k (k-1))/2} \log n\right)$ in expectation. It remains to be seen if
there exists a deterministic algorithm that runs in $o(kn^2)$ for this problem. As future work it
would be interesting to know whether HST can be used for the unordered convex layers problem in
higher dimensions. We know that, unlike the maximal layers problem, this problem is not
decomposable [5]. So it would be interesting to know whether, within our iterative framework, we
can extend HST to store the convex layers also.
Appendix

Proof of Lemma 3

Proof. Recall that $T$ is a linear extension of $P$. Since $p$ precedes $q$ in $T$,
$\mu(p) > \mu(q)$. Hence, $\exists j' \in \{1, \ldots, k\}$ such that $p[j'] > q[j']$. Let
$j' = \arg\max_{1 \le j \le k} p[j]$. We compute the probability
$\Pr[p[j] > q[j] \mid \mu(p) > \mu(q)]$ in two parts over the disjoint sets $\{j = j'\}$ and
$\{j \ne j'\}$:

$$\Pr[p[j] > q[j] \mid \mu(p) > \mu(q)] = \Pr[p[j] > q[j] \mid j = j', \mu(p) > \mu(q)] \Pr[j = j'] + \Pr[p[j] > q[j] \mid j \ne j', \mu(p) > \mu(q)] \Pr[j \ne j']$$

$$= \frac{1}{k} + \left(1 - \frac{\mu(q)}{2\mu(p)}\right)\left(1 - \frac{1}{k}\right) \qquad (3)$$

Since,

$$\Pr[p[j] > q[j] \mid j \ne j', \mu(p) > \mu(q)] = \frac{(\mu(p) - \mu(q))\mu(q)}{\mu(p)\mu(q)} + \frac{\mu(q)^2}{2\mu(p)\mu(q)} = 1 - \frac{\mu(q)}{2\mu(p)}$$

This follows from the fact that $p[j]$ and $q[j]$ are independent random variables uniformly
distributed over $[0, \mu(p)]$ and $[0, \mu(q)]$ (given $\mu(p) > \mu(q)$) respectively. In the
set $\{j = j'\}$ clearly $p[j] > q[j]$. However, in the set $\{j \ne j'\}$ the probability that
$p[j] > q[j]$ is $1 - \frac{\mu(q)}{2\mu(p)}$. We note that $\mu(p), \mu(q)$ are themselves
random variables. More importantly they are i.i.d. random variables having the following
distribution:

$$\Pr[\mu(p) \le t] = t^k$$

on the interval $[0,1]$. This follows from how points in $P$ are constructed. We take the
expectation of both sides of Equation 3 over the event space generated by $\mu(p), \mu(q)$ on the
set $\{\mu(p) > \mu(q)\}$:

$$\Pr[p[j] > q[j] \mid \mu(p) > \mu(q)] = 1 - \frac{1}{2}\left(1 - \frac{1}{k}\right) E\left[\frac{\mu(q)}{\mu(p)} \,\Big|\, \mu(p) > \mu(q)\right] = 1 - \frac{1}{2}\left(1 - \frac{1}{k}\right)\frac{k}{k+1} = 1 - \frac{k-1}{2(k+1)}$$

Now, let us compute $E\left[\frac{\mu(q)}{\mu(p)} \,\big|\, \mu(p) > \mu(q)\right]$. Recall that
$\mu(p) = \max_{1 \le j \le k} p[j]$. So the distribution function of $\mu(p)$ is,

$$F[\mu(p)] = \Pr[\mu(p) \le t] = \Pr\left[\bigwedge_{1 \le j \le k} p[j] \le t\right] = \prod_{1 \le j \le k} \Pr[p[j] \le t] = t^k$$

where the product form comes from the fact that the components of $p$ are independent and
identically distributed on $[0,1]$ with uniform probability. Hence,

$$E\left[\frac{\mu(q)}{\mu(p)} \,\Big|\, \mu(p) > \mu(q)\right] = \frac{1}{\Pr[\mu(p) > \mu(q)]} \int_{\mu(p) > \mu(q)} \frac{\mu(q)}{\mu(p)}\, dF[\mu(p)]\, dF[\mu(q)]$$

$$= \frac{1}{\Pr[\mu(p) > \mu(q)]} \int_0^1 \int_0^{\mu(p)} \frac{\mu(q)}{\mu(p)}\, k^2 \mu(p)^{k-1} \mu(q)^{k-1}\, d\mu(q)\, d\mu(p) = \frac{k}{k+1}$$

A similar argument can be used to prove the second claim.


□ 
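
The value $E\left[\frac{\mu(q)}{\mu(p)} \mid \mu(p) > \mu(q)\right] = \frac{k}{k+1}$ used in the proof can be checked empirically. The sketch below is a Monte Carlo estimate under the sampling model of Definition 2 (coordinates drawn independently and uniformly from $[0,1]$); it only checks this one expectation, not the full statement of Lemma 3, and the function name is illustrative.

```python
import random


def estimate_mu_ratio(k: int, trials: int, seed: int = 0) -> float:
    """Estimate E[mu(q) / mu(p) | mu(p) > mu(q)] for uniform points in [0,1]^k.

    mu(x) is the largest coordinate of x; pairs with mu(p) <= mu(q) are skipped.
    """
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(trials):
        mu_p = max(rng.random() for _ in range(k))
        mu_q = max(rng.random() for _ in range(k))
        if mu_p > mu_q:
            total += mu_q / mu_p
            count += 1
    return total / count
```

For example, `estimate_mu_ratio(3, 100000)` should land close to $3/4$, with the gap shrinking as the number of trials grows.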


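Solving a(w,d)

To simplify our calculations we modify the recurrence slightly: With $a(w,d) = k^d b(w,d)$, the
recurrence equation becomes,

$$b(w,d) = \frac{1}{k^d}\, b(w-1, d-1) + \left(1 - \frac{1}{k^d}\right) b(w-1, d)$$

Let $G_d(z) = \sum_{w \ge 0} b(w,d) z^w$. We note that $b(w,d) = 0$ when $w < d$. Then we have,

$$G_d(z) = \frac{z}{k^d}\, G_{d-1}(z) + z\left(1 - \frac{1}{k^d}\right) G_d(z)$$

$$= \frac{z^d\, k^{-d(d+1)/2}}{\prod_{i=1}^{d}\left(1 - (1 - k^{-i}) z\right)}\, G_0(z)$$

But $G_0(z) = \sum_{w \ge 0} b(w,0) z^w = \frac{z}{1 - z}$, since $b(w,0) = 1$ when $w \ge 1$.
Hence,

$$b(w,d) = k^{-d(d+1)/2}\, [z^{w-d-1}]\, \frac{1}{(1 - z)\prod_{i=1}^{d}\left(1 - (1 - k^{-i}) z\right)} \qquad (4)$$

where the notation $[z^t]\,p(z)$ means the coefficient of $z^t$ in $p(z)$ as usual. Using partial
fractions: Let,

$$\frac{1}{(1 - z)\prod_{i=1}^{d}\left(1 - (1 - k^{-i}) z\right)} = \frac{\beta_0}{1 - z} + \sum_{i=1}^{d} \frac{\beta_i}{1 - (1 - k^{-i}) z}$$

For which we get the following solution,

$$\beta_0 = k^{d(d+1)/2}, \qquad \beta_i = -\frac{k^{d(d+1)/2}\,(1 - k^{-i})^{d}}{\prod_{j=1, j \ne i}^{d}(1 - k^{j-i})}$$

Substituting these in Equation 4 above we get,

$$b(w,d) = 1 - \sum_{i=1}^{d} \frac{(1 - k^{-i})^{w-1}}{\prod_{j=1, j \ne i}^{d}(1 - k^{j-i})}$$

which gives us the desired result for $a(w,d)$.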
Proof of Lemma 4

We have $S = \sum_{i=0}^{n} b_i m^i$, where $b_r \ge b_{r+1} \ge \ldots \ge b_n$. But then,

$$S = \sum_{i=0}^{r} b_i m^i + \sum_{i=r+1}^{n} b_i m^i \le \sum_{i=0}^{r} b_i m^i + \sum_{i=r+1}^{n} b_{r+1} m^i \le \sum_{i=0}^{r} b_i m^i + \frac{b_{r+1}\, m^{r+1}}{1 - m}$$

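Computing d_0

We shall denote $d_0$ as $d_0(w,k)$ as it is a function of both $w$ and $k$. Hence, $d_0(w,k)$
maximizes $a(w,d)$ as $d$ varies from 0 to $w - 1$. Since we are interested in an upper bound on
$d_0(w,k)$, we may think of $d$ as being fixed while we vary $w$ from 0 to $\infty$ (actually
from $d + 1$, as terms below it are 0, but this does not affect our analysis). If we then lower
bound $w$ at the point where $a(w,d)$ is maximum, we will be able to get a corresponding upper
bound for $d_0(w,k)$. This makes our analysis simpler as the number of terms in the expression
for $a(w,d)$ is fixed for a fixed $d$.

Let,

$$a'(w,d) = a(w,d-1) - a(w,d) = k^{d-1}\left(1 - \sum_{i=1}^{d-1} \frac{(1 - k^{-i})^{w-1}}{\prod_{j=1, j \ne i}^{d-1}(1 - k^{j-i})}\right) - k^{d}\left(1 - \sum_{i=1}^{d} \frac{(1 - k^{-i})^{w-1}}{\prod_{j=1, j \ne i}^{d}(1 - k^{j-i})}\right)$$

Letting $\alpha_i = 1 - \frac{1}{k^i}$ we get,

$$a'(w,d) = -k^{d-1}(k - 1) + k^{d-1} \sum_{i=1}^{d} \frac{\alpha_i^{w-1}\,(k - \alpha_{i-d})}{\prod_{j=1, j \ne i}^{d} \alpha_{i-j}}$$

Since we wish to compute $d_0(w,k)$, or at least get an upper bound, we assume that
$a'(w,d) \le 0$. Hence,

$$\sum_{i=1}^{d} \frac{\alpha_i^{w-1}\,(k - \alpha_{i-d})}{\prod_{j=1, j \ne i}^{d} \alpha_{i-j}} \le k - 1$$

Since
$\prod_{j=1, j \ne i}^{d} \alpha_{i-j} = \prod_{j=1}^{i-1}(1 - k^{-j}) \prod_{j=1}^{d-i}(1 - k^{j}) = P(i-1) \prod_{j=1}^{d-i}(1 - k^{j})$,
where $P(i) = \prod_{j=1}^{i} \alpha_j$. Let
$A = k \sum_{i=1}^{d} \frac{\alpha_i^{w-1}}{P(i-1)\prod_{j=1}^{d-i}(1 - k^{j})}$ and
$B = -\sum_{i=1}^{d} \frac{\alpha_i^{w-1}\,\alpha_{i-d}}{P(i-1)\prod_{j=1}^{d-i}(1 - k^{j})}$.
Then according to our assumption, $A + B \le k - 1$.

However, writing out the terms in the expression for $A$ yields:

$$A = k\left(\frac{\alpha_d^{w-1}}{P(d-1)} - \frac{\alpha_{d-1}^{w-1}}{(k-1)P(d-2)} + \frac{\alpha_{d-2}^{w-1}}{(k-1)(k^2-1)P(d-3)} - \cdots\right)$$

$$= k\left(\frac{\alpha_d^{w-1}}{P(d-1)} - \frac{\alpha_{d-1}^{w-1}}{(k-1)P(d-2)}\left(1 - \frac{\alpha_{d-2}^{w-1}\,P(d-2)}{\alpha_{d-1}^{w-1}\,(k^2-1)\,P(d-3)} + \cdots\right)\right)$$

$$\ge k\left(\frac{\alpha_d^{w-1}}{P(d-1)} - \frac{\alpha_{d-1}^{w-1}}{(k-1)P(d-2)}\right)$$

The last inequality drops the remaining terms of the alternating sum; it is not difficult to see
that they are of lower order, since $P(i) < P(j)$ when $i > j$ and $\alpha_i^{w-1}$ decreases as
$i \to 0$. Similarly, we can show that,

$$B \ge -\frac{\alpha_{d-1}^{w-1}}{P(d-2)}$$

Thus we get,

$$\frac{k\,\alpha_d^{w-1}}{P(d-1)} - \frac{\alpha_{d-1}^{w-1}}{P(d-2)}\left(1 + \frac{k}{k-1}\right) \le k - 1$$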


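Since we want to get an upper bound for $d_0(w,k)$ we assume that $w$ is sufficiently large. More
precisely, we let $w \ge ck^{d-1} + 1$ where $0 < c < 1$. But then,

$$\alpha_{d-1}^{w-1} = \left(1 - \frac{1}{k^{d-1}}\right)^{w-1} \le \left(1 - \frac{1}{k^{d-1}}\right)^{ck^{d-1}} \le e^{-c}$$

Here we take $c = \ln(5/3)$, so that $e^{-c} = 3/5$. Putting this value for $\alpha_{d-1}^{w-1}$
in the main equation and dividing both sides by $k$ we get,

$$\frac{\alpha_d^{w-1}}{P(d-1)} \le 1 - \frac{1}{k} + \frac{3}{5} \cdot \frac{2k - 1}{k(k-1)P(d-1)}$$

since $P(d-1) < P(d-2)$. But,

$$P(d-1) \ge P(\infty) = \left(1 - \frac{1}{k}\right)\left(1 - \frac{1}{k^2}\right)\cdots = 1 - \frac{1}{k} - \frac{1}{k^2} + \frac{1}{k^5} + \frac{1}{k^7} - \cdots \ge 1 - \frac{1}{k} - \frac{1}{k^2}$$

and $P(d-1) \le 1 - \frac{1}{k}$. Substituting these upper and lower bounds of $P(d-1)$ into the
LHS and RHS of the expression respectively we get,

$$\frac{k}{k-1}\,\alpha_d^{w-1} \le 1 - \frac{1}{k} + \frac{3}{5} \cdot \frac{(2k-1)k}{(k-1)(k^2 - k - 1)}$$

that is, $\alpha_d^{w-1} \le 1 - \frac{\gamma(k)}{k}$, where $0 < \gamma(k) < 1$ for $k \ge 4$.
So we have $\left(1 - \frac{1}{k^d}\right)^{w-1} \le 1 - \frac{\gamma(k)}{k}$, from which we get
$d \le \log_k w + 2$. This is the upper bound on $d_0(w,k)$ that we sought.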


Derivation of Equation 3 in Theorem 3 

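We know $d_0 \le \beta + 3$, where $\beta = \lfloor \log_k w \rfloor$. Hence (for $k \ge 4$),

$$u(w) = \sum_{d=0}^{w-1} \eta_1^{d}\, a(w,d)$$

$$\le \sum_{d=0}^{d_0} \eta_1^{d}\, a(w,d) + \frac{\eta_1^{d_0+1}}{1 - \eta_1}\, a(w, d_0 + 1)$$

$$\le \sum_{d=0}^{\beta} (\eta_1 k)^{d} + \eta_1^{\beta+1}\left(a(w,\beta+1) + a(w,\beta+2) + a(w,\beta+3)\right) + \frac{\eta_1^{\beta+4}}{1 - \eta_1}\, a(w, \beta + 4)$$

Here we observe that $a(w, \beta + i) \le w$ for $1 \le i \le 4$. Substituting these bounds we
get,

$$u(w) \le \frac{(\eta_1 k)^{\beta+1} - 1}{\eta_1 k - 1} + c_1 w\, \eta_1^{\beta+1}$$

where $c_1 \le 5$ is a constant. Thus,

$$u(w) \le \frac{(\eta_1 k)^{\beta}\,(\eta_1 k)}{\eta_1 k - 1} + c_1 w\, \eta_1^{\beta+1}$$

$$\le c_2 (\eta_1 k)^{\beta} + c_1 w\, \eta_1^{\beta+1}$$

$$\le c_2 (\eta_1 k)^{\log_k w} + c_1 w\, \eta_1^{\log_k w}$$

$$\le (c_1 + c_2)\, w\, \eta_1^{\log_k w} \le c_3\, w^{1 - \frac{1}{\log k} + \log_k\left(1 + \frac{2}{k+1}\right)}$$

Here, $c_2$ and $c_3$ are constants.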



